JSM 2023
Toronto, Canada
The University of Utah
2023-08-09
What I highlight in their paper:
Start-to-finish framework for multi-ERG models.
Dealing with heterogeneous samples.
Model building process.
Goodness-of-fit analyses.
Two important missing pieces (for the next paper): power analysis and how to deal with collinearity in small networks.
Two different questions: “How many nodes?” and “How many networks?”
Is the network bounded?
If it is bounded, can we collect all the nodes?
If we cannot collect all the nodes, can we do inference (Schweinberger, Krivitsky, and Butts 2017; Schweinberger et al. 2020)?
There is a growing number of studies featuring multiple networks (e.g., egocentric studies).
There’s no clear way to do power analysis in ERGMs.
Power analysis is fundamental for justifying funding requests, so we need a way to do it.
We can leverage conditional ERG models for power analysis.
Conditioning on one sufficient statistic results in a distribution invariant to the associated parameter, formally:
\[\begin{align}
{\mbox{Pr}_{\mathcal{Y},\boldsymbol{\theta}}\left(\boldsymbol{Y}= \boldsymbol{y}\left|\;\boldsymbol{g}\left(\boldsymbol{Y}\right)_l = s_l\right.\right)}
& = \frac{\mbox{Pr}_{\mathcal{Y},\boldsymbol{\theta}}\left(\boldsymbol{Y} = \boldsymbol{y}\right)}{\sum_{\boldsymbol{y}'\in\mathcal{Y}:\boldsymbol{g}\left(\boldsymbol{y}'\right)_l = s_l}\mbox{Pr}_{\mathcal{Y},\boldsymbol{\theta}}\left(\boldsymbol{Y} = \boldsymbol{y}'\right)} \notag \\
& = \frac{\mbox{exp}\left\{{\boldsymbol{\theta}}^{\boldsymbol{t}}\boldsymbol{g}\left(\boldsymbol{y}\right)\right\}}{\sum_{\boldsymbol{y}'\in\mathcal{Y}:\boldsymbol{g}\left(\boldsymbol{y}'\right)_l = s_l}\mbox{exp}\left\{{\boldsymbol{\theta}}^{\boldsymbol{t}}\boldsymbol{g}\left(\boldsymbol{y}'\right)\right\}} \notag \\
& = \frac{\mbox{exp}\left\{{\boldsymbol{\theta}_{-l}}^{\boldsymbol{t}}\boldsymbol{g}\left(\boldsymbol{y}\right)_{-l}\right\}}{\kappa_{\mathcal{Y}}\left(\boldsymbol{\theta}\right)_{-l}}, \tag{1}
\end{align}\]
where \(\boldsymbol{g}\left(\boldsymbol{y}\right)_l\) and \(\boldsymbol{\theta}_l\) are the \(l\)-th elements of \(\boldsymbol{g}\left(\boldsymbol{y}\right)\) and \(\boldsymbol{\theta}\), respectively, \(\boldsymbol{g}\left(\boldsymbol{y}\right)_{-l}\) and \(\boldsymbol{\theta}_{-l}\) are the vectors of their remaining elements, and \(\kappa_{\mathcal{Y}}\left(\boldsymbol{\theta}\right)_{-l} = \sum_{\boldsymbol{y}' \in \mathcal{Y}: \boldsymbol{g}\left(\boldsymbol{y}'\right)_l = s_l}\mbox{exp}\left\{{\boldsymbol{\theta}_{-l}}^{\boldsymbol{t}}\boldsymbol{g}\left(\boldsymbol{y}'\right)_{-l}\right\}\) is the normalizing constant. The result holds for any \(\boldsymbol{y}\) with \(\boldsymbol{g}\left(\boldsymbol{y}\right)_l = s_l\): the factor \(\mbox{exp}\left\{\theta_l s_l\right\}\) is shared by every term, so it cancels and the conditional distribution does not depend on \(\theta_l\).
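A brute-force check of Eq. (1) on a toy model can make the invariance concrete. The setup is an assumption chosen for illustration, not from the slides: undirected graphs on 4 nodes, statistics \(\boldsymbol{g}(\boldsymbol{y})\) = (ties, triangles), and conditioning on the tie count (index 0 in the code). Enumerating all 64 graphs, the conditional distribution comes out the same for two different tie parameters and matches the right-hand side of Eq. (1). All function names are hypothetical.

```python
import itertools
import math

NODES = range(4)
DYADS = list(itertools.combinations(NODES, 2))        # 6 undirected dyads
# Every graph on 4 nodes, stored as a tuple of its ties (2^6 = 64 graphs).
GRAPHS = [tuple(d for d, on in zip(DYADS, bits) if on)
          for bits in itertools.product((0, 1), repeat=len(DYADS))]


def g(y):
    """Sufficient statistics g(y) = (number of ties, number of triangles)."""
    ties = set(y)
    triangles = sum(1 for a, b, c in itertools.combinations(NODES, 3)
                    if {(a, b), (a, c), (b, c)} <= ties)
    return (len(ties), triangles)


def conditional_pmf(theta, l, s_l):
    """Left-hand side of Eq. (1): Pr(Y = y | g(Y)_l = s_l) using the full theta."""
    support = [y for y in GRAPHS if g(y)[l] == s_l]
    w = [math.exp(sum(t * s for t, s in zip(theta, g(y)))) for y in support]
    total = sum(w)
    return {y: wi / total for y, wi in zip(support, w)}


def eq1_pmf(theta, l, s_l):
    """Right-hand side of Eq. (1): drop the l-th term; kappa_{-l} normalizes."""
    support = [y for y in GRAPHS if g(y)[l] == s_l]
    w = [math.exp(sum(t * s for j, (t, s) in enumerate(zip(theta, g(y))) if j != l))
         for y in support]
    kappa = sum(w)
    return {y: wi / kappa for y, wi in zip(support, w)}


# Condition on 3 ties (l = 0); the triangle parameter stays at 0.5.
low = conditional_pmf((-1.0, 0.5), 0, 3)
high = conditional_pmf((2.0, 0.5), 0, 3)
rhs = eq1_pmf((2.0, 0.5), 0, 3)
print(max(abs(low[y] - high[y]) for y in low), max(abs(low[y] - rhs[y]) for y in low))
```

Both printed differences should be at floating-point rounding level. Exhaustive enumeration only works for tiny graphs; for realistic sizes the conditional distribution is sampled, e.g., by MCMC.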
Conditioning on the number of ties, we can use this to simulate networks with a prescribed density (e.g., taken from previous studies) without specifying the edges parameter, and then compute power through simulation.
Example: we want to study gender homophily in networks of size 8.
On average, the focal networks have 20 ties (i.e., a density of \((2\times 20)/(8 \times 7) \approx 0.71\) for undirected networks).
To detect an effect size of \(\boldsymbol{\theta}_{\mbox{homophily}} = 2\), we could approximate the required sample size as follows:
For each \(n \in N \equiv \{10, 20, \dots\}\), do:
Using Eq. (1), simulate (e.g., via MCMC) \(1{,}000\) sets of \(n\) networks, each with 8 nodes, 20 ties, and \(\boldsymbol{\theta}_{\mbox{homophily}} = 2\).
For each set \(i\), fit a conditional ERGM to estimate \(\widehat{\boldsymbol{\theta}}_{\mbox{homophily}}\), and define the indicator \(p_{n, i}\), equal to one if the estimate is significant at the 5% level (95% confidence).
The empirical power for \(n\) is \(p_n \equiv \frac{1}{1{,}000}\sum_{i}p_{n, i}\).
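A minimal sketch of the steps above, under assumptions the slides do not state: undirected networks, 4 women and 4 men in every network, and a conditional ERGM whose only term besides the conditioned tie count is the number of same-gender ties. Under those assumptions the conditional distribution of that count is Fisher's noncentral hypergeometric, so the sketch swaps MCMC for exact sampling, estimates \(\widehat{\boldsymbol{\theta}}_{\mbox{homophily}}\) by conditional maximum likelihood, and uses a Wald test. All names are hypothetical; this is not the API of any ERGM package.

```python
import math
import random
import statistics

# Assumptions (not from the source): undirected networks, 4 women and 4 men in
# every network, and a conditional ERGM whose only free term counts
# same-gender ties ("homophily"), given exactly 20 ties.
N_NODES, N_TIES = 8, 20
N_DYADS = N_NODES * (N_NODES - 1) // 2      # 28 possible ties
N_SAME = 2 * math.comb(4, 2)                # 12 same-gender dyads
N_DIFF = N_DYADS - N_SAME                   # 16 mixed-gender dyads
H_VALUES = list(range(max(0, N_TIES - N_DIFF), min(N_SAME, N_TIES) + 1))

# log(number of graphs with N_TIES ties, h of them between same-gender nodes)
LOG_COUNTS = [math.log(math.comb(N_SAME, h) * math.comb(N_DIFF, N_TIES - h))
              for h in H_VALUES]


def homophily_pmf(theta):
    """Exact Pr(h | N_TIES ties): Fisher's noncentral hypergeometric pmf."""
    logw = [c + theta * h for c, h in zip(LOG_COUNTS, H_VALUES)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]   # shift by the max to avoid overflow
    total = sum(w)
    return [v / total for v in w]


def mean_var(theta):
    """Mean and variance of the homophily count h under theta."""
    p = homophily_pmf(theta)
    mean = sum(h * ph for h, ph in zip(H_VALUES, p))
    var = sum((h - mean) ** 2 * ph for h, ph in zip(H_VALUES, p))
    return mean, var


def fit_theta(h_bar, lo=-20.0, hi=20.0):
    """Conditional MLE: solve E_theta[h] = h_bar by bisection (the mean increases with theta)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if mean_var(mid)[0] < h_bar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def empirical_power(n, theta=2.0, n_sets=1000, alpha=0.05, seed=42):
    """p_n: share of the n_sets simulated samples whose Wald test rejects theta = 0."""
    rng = random.Random(seed)
    probs = homophily_pmf(theta)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sets):
        h = rng.choices(H_VALUES, weights=probs, k=n)      # one statistic per network
        theta_hat = fit_theta(sum(h) / n)
        se = 1 / math.sqrt(n * mean_var(theta_hat)[1])     # inverse Fisher information
        hits += abs(theta_hat) / se > z_crit                # indicator p_{n,i}
    return hits / n_sets
```

For example, `[empirical_power(n) for n in (10, 20, 30)]` gives \(p_{10}, p_{20}, p_{30}\). With more model terms the exact shortcut no longer applies and MCMC would be needed. Under these particular assumptions \(\theta = 2\) is a large effect, so \(p_n\) may already be close to 1 at \(n = 10\).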
Once we have computed the sequence \(\{p_{10}, p_{20}, \dots\}\), we can fit a linear model to estimate the sample size as a function of the power, e.g., \(n = \beta_0 + \beta_1 p_n + \beta_2 p_n^2 + \varepsilon\).
With this model in hand, we can estimate the sample size required to detect the simulated effect size (here, \(\boldsymbol{\theta}_{\mbox{homophily}} = 2\)) with a desired power.
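A sketch of this last step, assuming the empirical powers \(p_n\) are already available. It fits \(n = \beta_0 + \beta_1 p_n + \beta_2 p_n^2\) by ordinary least squares (normal equations solved by Gaussian elimination) and evaluates the fitted curve at a target power, rounding up to a whole number of networks. Function names are hypothetical, and the power values below are made up for illustration; they are not results.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def fit_quadratic(powers, sizes):
    """OLS fit of n = b0 + b1 * p + b2 * p**2 via the normal equations X'X b = X'n."""
    X = [[1.0, p, p * p] for p in powers]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * n for row, n in zip(X, sizes)) for i in range(3)]
    return solve(xtx, xty)


def required_n(beta, target_power):
    """Fitted curve evaluated at target_power, rounded up to a whole number of networks."""
    b0, b1, b2 = beta
    return math.ceil(b0 + b1 * target_power + b2 * target_power ** 2)


# Made-up illustration only: hypothetical empirical powers for n = 10, 20, ..., 50.
sizes = [10, 20, 30, 40, 50]
powers = [0.35, 0.58, 0.74, 0.85, 0.92]
beta = fit_quadratic(powers, sizes)
print(required_n(beta, 0.80))
```

Regressing \(n\) on \(p_n\), as in the slide, lets the fitted curve be read off directly at the target power instead of being inverted.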
The variance inflation factor (VIF) is a common measure of collinearity in regression models.
Usually, VIF > 10 is considered problematic.
Duxbury’s (2021) large simulation study of ERGMs recommends a VIF threshold between 150 and 200 for multicollinearity.
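A minimal sketch of the VIF computation, using the identity \(\mbox{VIF}_j = 1/(1 - R_j^2) = [\mathbf{R}^{-1}]_{jj}\), where \(\mathbf{R}\) is the correlation matrix of the statistics. The input is assumed to be draws of the model's sufficient statistics (for an ERGM, e.g., statistics simulated from the fitted model); that choice of input, and the function names, are assumptions for illustration.

```python
import math


def correlation_matrix(cols):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    dev = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(d * d for d in row)) for row in dev]
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norms[i] * norms[j])
             for j in range(len(cols))] for i in range(len(cols))]


def invert(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[k:] for row in m]


def vif(cols):
    """VIF_j = 1 / (1 - R_j^2), i.e., the j-th diagonal element of R^{-1}."""
    r_inv = invert(correlation_matrix(cols))
    return [r_inv[j][j] for j in range(len(cols))]


# Toy columns (made up): their correlation is 0.8, so each VIF is 1 / (1 - 0.64).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(vif([x, y]))
```

Reading the VIF off the inverse correlation matrix avoids running one auxiliary regression per statistic; a perfectly collinear pair makes \(\mathbf{R}\) singular, and the inversion then fails.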
In small networks, collinearity can be even more severe.
In a directed network with 5 nodes, two of them female and three male, transitive triads are almost perfectly predicted by mutual ties.
When \(\boldsymbol{\theta}_{\mbox{ttriad}} = 1\) (second row), Cor(transitive triads, mutual ties) \(\to 1\) and the VIF exceeds 4,500.
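To put that number on the correlation scale: with only two statistics, the VIF of either one is \(1/(1 - r^2)\), where \(r\) is their correlation. With more statistics in the model, \(R_j^2 \ge r^2\) for any pairwise correlation \(r\) involving statistic \(j\), so a correlation this large is already enough to push the VIF past the same value:

\[
\mbox{VIF} = \frac{1}{1 - r^2} > 4{,}500
\iff |r| > \sqrt{1 - \frac{1}{4{,}500}} \approx 0.99989.
\]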
Krivitsky, Coletti, and Hens’ work makes an important contribution to ERG models, most notably in model building, model selection, and GOF for multi-network models.
Power (sample size requirements) and multicollinearity are two important issues that are yet to be addressed.
I presented a possible approach to deal with power analysis in ERGMs using conditional distributions.
Collinearity in small networks (like those in KCH) can be more serious than in larger networks, yet it needs further exploration.
Vega Yon – ggv.cl/slides/jsm2023 – The University of Utah